End-to-End DNN Training with Block Floating Point Arithmetic

Authors

  • Mario Drumond
  • Tao Lin
  • Martin Jaggi
  • Babak Falsafi
Abstract

DNNs are ubiquitous datacenter workloads, requiring orders of magnitude more computing power from servers than traditional workloads. As such, datacenter operators are forced to adopt domain-specific accelerators that employ half-precision floating-point (FP) numeric representations to improve arithmetic density. Unfortunately, even these representations are not dense enough and are therefore sub-optimal for DNNs. We propose a hybrid approach that employs dense block floating-point (BFP) arithmetic for dot product computations and FP arithmetic elsewhere. Using BFP improves the performance of dot product operations, which make up most of DNN computation, while allowing values to float freely between dot products leads to a better choice of tensor exponents when converting values back to BFP. We show that models trained with hybrid BFP-FP arithmetic either match or outperform their FP32 counterparts, leading to more compact models and denser arithmetic in computing platforms.
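To make the hybrid scheme concrete, here is a minimal NumPy sketch: an FP32 block is converted to BFP (one shared exponent plus narrow integer mantissas), the dot product runs as an exact integer multiply-accumulate, and a single rescale returns the result to FP. The function names, the 8-bit mantissa width, and the per-call conversion are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def to_bfp(x, mantissa_bits=8):
    """Quantize a 1-D FP32 block to block floating point: one shared
    exponent (from the block's peak magnitude) plus signed integer
    mantissas of `mantissa_bits` bits."""
    max_val = np.max(np.abs(x))
    shared_exp = int(np.ceil(np.log2(max_val))) if max_val > 0 else 0
    # Scale so mantissas fit in [-2^(m-1), 2^(m-1) - 1], then round.
    # A value exactly at the block peak may clamp by one LSB.
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    mantissas = np.clip(np.round(x / scale),
                        -(2 ** (mantissa_bits - 1)),
                        2 ** (mantissa_bits - 1) - 1).astype(np.int32)
    return mantissas, shared_exp

def bfp_dot(a, b, mantissa_bits=8):
    """Dot product in BFP: integer multiply-accumulate on the mantissas,
    then one FP rescale using both shared exponents."""
    ma, ea = to_bfp(a, mantissa_bits)
    mb, eb = to_bfp(b, mantissa_bits)
    acc = np.dot(ma.astype(np.int64), mb.astype(np.int64))  # exact integer MAC
    scale = 2.0 ** (ea + eb - 2 * (mantissa_bits - 1))
    return acc * scale  # result handed back to ordinary FP

rng = np.random.default_rng(0)
a, b = rng.standard_normal(256), rng.standard_normal(256)
print(bfp_dot(a, b), np.dot(a, b))  # BFP result vs FP32 reference
```

Because both operands share a single exponent per block, the multiply-accumulate needs no per-element exponent alignment, which is what makes BFP dot products denser in hardware than their FP counterparts.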

Similar resources

Realization of Digital Filters Using Block-Floating-Point Arithmetic

Recently, statistical models for the effects of roundoff noise in fixed-point and floating-point realizations of digital filters have been proposed and verified, and a comparison between these realizations presented. In this paper a structure for implementing digital filters using block-floating-point arithmetic is proposed and a statistical analysis of the effects of roundoff noise is carried o...
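As a rough illustration of the block-floating-point realization being discussed, the sketch below filters the input in blocks, each block sharing a single exponent with its samples quantized to integer mantissas, so roundoff noise enters at the block quantization step. It is a toy NumPy model under assumed parameters (block size, mantissa width), not the structure proposed or analyzed in the paper.

```python
import numpy as np

def bfp_fir(x, h, block=64, mantissa_bits=12):
    """Illustrative block-floating-point FIR realization: each input
    block shares one exponent; samples are quantized to integer
    mantissas and the dequantized block is filtered, so roundoff
    enters only at the block quantization step."""
    y = np.zeros(len(x))
    hist = np.zeros(len(h) - 1)                # filter state across blocks
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        peak = np.max(np.abs(seg))
        exp = int(np.ceil(np.log2(peak))) if peak > 0 else 0
        scale = 2.0 ** (exp - (mantissa_bits - 1))
        seg_q = np.round(seg / scale) * scale  # shared-exponent quantization
        full = np.concatenate([hist, seg_q])
        y[start:start + len(seg)] = np.convolve(full, h, mode="valid")
        hist = full[len(full) - (len(h) - 1):]
    return y

h = np.array([0.25, 0.5, 0.25])                # simple low-pass FIR
x = np.sin(np.linspace(0, 20, 500))
y = bfp_fir(x, h)
```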

Supercharge Your DSP with Ultra-Fast Floating-Point FFTs

Engineers targeting DSP to FPGAs have traditionally used fixed-point arithmetic, mainly because of the high cost associated with implementing floating-point arithmetic. That cost comes in the form of increased circuit complexity and often degraded maximum clock performance. Certain applications demand the dynamic range offered by floating-point hardware but require speeds and circuit sizes usua...
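The dynamic-range argument can be seen in a toy comparison: with a single fixed-point scaling, small samples underflow to zero, while floating point keeps them. The Q1.15 format and the sample values below are illustrative assumptions, not anything from the article.

```python
import numpy as np

# A signal spanning six orders of magnitude: comfortably inside
# float32's dynamic range, but beyond one fixed Q1.15 scaling.
x = np.array([0.5, 1e-3, 1e-6])

# Q1.15 fixed point: 16-bit signed fraction, values in [-1, 1).
q15 = np.round(x * 2**15).astype(np.int16) / 2**15
f32 = x.astype(np.float32)

print(q15)  # [0.5, ~0.001007, 0.0]  <- 1e-6 underflows to zero
print(f32)  # [5e-01, 1e-03, 1e-06]  <- all three survive
```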

FPGA Based Quadruple Precision Floating Point Arithmetic for Scientific Computations

In this project we explore the capability and flexibility of FPGA solutions to accelerate scientific computing applications which require very high precision arithmetic, based on IEEE 754 standard 128-bit floating-point number representations. Field Programmable Gate Arrays (FPGAs) are increasingly being used to design high-end, computationally intense microprocessors capable of handlin...
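For reference, IEEE 754 binary128 (quadruple precision) packs 1 sign bit, 15 exponent bits (bias 16383), and 112 fraction bits into 128 bits. The decoder below illustrates that layout only; since Python floats are doubles, it demonstrates the field structure rather than full 113-bit precision, and the helper name is ours, not from the paper.

```python
# IEEE 754 binary128 layout:
# 1 sign bit | 15 exponent bits (bias 16383) | 112 fraction bits.
SIGN_BITS, EXP_BITS, FRAC_BITS = 1, 15, 112
BIAS = 2 ** (EXP_BITS - 1) - 1  # 16383

def decode_binary128(bits):
    """Decode a normal binary128 value from its 128-bit integer pattern.
    Returns a Python double, so precision beyond 53 bits is lost; this
    only illustrates the field layout."""
    frac = bits & ((1 << FRAC_BITS) - 1)
    exp = (bits >> FRAC_BITS) & ((1 << EXP_BITS) - 1)
    sign = bits >> (FRAC_BITS + EXP_BITS)
    # value = (-1)^sign * (1 + frac / 2^112) * 2^(exp - 16383)
    return (-1) ** sign * (1 + frac / 2 ** FRAC_BITS) * 2.0 ** (exp - BIAS)

# 1.5 = (1 + 0.5) * 2^0: exponent field 16383, top fraction bit set.
pattern = (BIAS << FRAC_BITS) | (1 << (FRAC_BITS - 1))
print(decode_binary128(pattern))  # 1.5
```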

A Finite Precision Block Floating Point Treatment to Direct Form, Cascaded and Parallel FIR Digital Filters

This paper proposes an efficient finite precision block floating point (BFP) treatment to the fixed coefficient finite impulse response (FIR) digital filter. The treatment includes effective implementation of all the three FIR filters when realized in finite precision arithmetic with two widely known data formats, namely, fixed-point (FxP) and floating-point (FP) representation systems, by inve...

On Speeding up the Deep Neural Network Based Speech Recognition Systems

Recently, the deep neural network (DNN) as an acoustic model has been successfully applied to large vocabulary continuous speech recognition (LVCSR) tasks, e.g., a relative word error reduction of around 20% can be achieved compared to a state-of-the-art discriminatively trained Gaussian Mixture Model (GMM). However, due to the huge number of parameters in the DNN, real-time decoding is a bottlenec...

Publication date: 2018